Feature Selection and Classification of Spam on Social Networking Sites

Authors

  • Antonio Lupher
  • Cliff Engle
  • Reynold Xin

Abstract

Social networking sites (SNSs) see a variety of spam and scams targeted at their users. Email spam offers little information beyond the message text and headers. In contrast, spam on SNSs is often accompanied by a wealth of data about the sender, which can be used to build more accurate detection mechanisms. We analyze 4 million private messages, as well as other public and private data from a popular social network, to gain insight into the features of spam messages and the accompanying user account data available to site operators. We use these insights to choose the features that best differentiate spammers from legitimate ("ham") users. Finally, we extract these features from the site's data and use them to train and evaluate classifiers.
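
The abstract does not name the specific features or classifiers the authors used. The following is a minimal, hypothetical Python sketch of the general pipeline it describes: rank sender-account features by how well they separate spammers from ham users, then train a classifier on the best ones. Everything concrete here is an assumption for illustration, not the authors' method: the feature names (`account_age_days`, `friend_count`, `url_fraction`, `msgs_per_day`), the toy data, the use of the Fisher score for ranking, and logistic regression as the classifier.

```python
"""Hypothetical sketch: rank sender-account features by Fisher score, then train
logistic regression on the top-k. Feature names and toy data are invented."""
import math
from typing import Callable, Dict, List, Sequence, Tuple

Account = Dict[str, float]


def fisher_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """(mean_spam - mean_ham)^2 / (var_spam + var_ham); larger = more discriminative."""
    def mean_var(xs: List[float]) -> Tuple[float, float]:
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / len(xs)

    m1, v1 = mean_var([v for v, y in zip(values, labels) if y == 1])
    m0, v0 = mean_var([v for v, y in zip(values, labels) if y == 0])
    return (m1 - m0) ** 2 / (v1 + v0 + 1e-12)  # epsilon avoids division by zero


def select_features(accounts: List[Account], labels: Sequence[int], k: int) -> List[str]:
    """Return the k feature names with the highest Fisher scores."""
    scores = {n: fisher_score([a[n] for a in accounts], labels) for n in accounts[0]}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the mean logistic (cross-entropy) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def fit_spam_classifier(accounts: List[Account], labels: Sequence[int],
                        k: int) -> Tuple[Callable[[Account], float], List[str]]:
    """Select the top-k features, z-score them, and fit logistic regression.
    Returns (predict_proba, selected_feature_names)."""
    names = select_features(accounts, labels, k)
    stats = {}
    for n in names:
        xs = [a[n] for a in accounts]
        m = sum(xs) / len(xs)
        sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs)) or 1.0
        stats[n] = (m, sd)

    def vectorize(a: Account) -> List[float]:
        return [(a[n] - stats[n][0]) / stats[n][1] for n in names]

    w, b = train_logreg([vectorize(a) for a in accounts], labels)

    def predict_proba(a: Account) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(w, vectorize(a))) + b)

    return predict_proba, names


# Toy, invented data: 1 = spam sender, 0 = legitimate ("ham") sender.
ACCOUNTS: List[Account] = [
    {"account_age_days": 2, "friend_count": 150, "url_fraction": 0.90, "msgs_per_day": 80},
    {"account_age_days": 3, "friend_count": 40, "url_fraction": 0.80, "msgs_per_day": 120},
    {"account_age_days": 1, "friend_count": 300, "url_fraction": 0.95, "msgs_per_day": 60},
    {"account_age_days": 5, "friend_count": 90, "url_fraction": 0.70, "msgs_per_day": 100},
    {"account_age_days": 400, "friend_count": 120, "url_fraction": 0.05, "msgs_per_day": 5},
    {"account_age_days": 900, "friend_count": 50, "url_fraction": 0.10, "msgs_per_day": 3},
    {"account_age_days": 200, "friend_count": 280, "url_fraction": 0.00, "msgs_per_day": 8},
    {"account_age_days": 650, "friend_count": 100, "url_fraction": 0.15, "msgs_per_day": 4},
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    predict, chosen = fit_spam_classifier(ACCOUNTS, LABELS, k=2)
    print(chosen)  # → ['url_fraction', 'msgs_per_day']
    new = {"account_age_days": 4, "friend_count": 10, "url_fraction": 0.85, "msgs_per_day": 90}
    print(f"P(spam) = {predict(new):.3f}")
```

The Fisher score is a univariate filter. It scores each feature independently, so it is cheap but will not detect redundancy between features. Wrapper or model-based selection can account for interactions, but at a higher computational cost.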

Similar resources

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is unwanted email that is harmful to communications around the world. Spam is a growing problem in personal email, so detecting it is essential. Machine learning is well suited to this problem because, owing to its adaptive nature, it learns the patterns required for classification with good results. Nonetheless, in spam detection, there are a large num...

Detecting Spam on Social Networking Sites: Related Work

1. RELATED WORK. The rise of social media has made Social Networking Services (SNSs) more attractive targets for spam and fraud, leading to increasingly sophisticated attacks. This trend is reflected in recent research, as papers have focused on identifying and classifying the various types of social media spam. Many of these studies employ techniques previously used to combat conventional email...

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and the Internet dominate users' lives, bringing business opportunities and time savings, threats such as information theft and penetration into systems have also arisen in hardware and software. Security is the top priority in preventing cyber-attacks, and users should first detect the type of attack, because virtual environments are not moni...

Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification

Nowadays, malicious URLs are a common threat to businesses, social networks, net banking, etc. Existing approaches have focused on binary detection, i.e., whether a URL is malicious or benign. Little literature has focused on detecting malicious URLs together with their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...

A feature selection method based on synonym merging in text classification system

As an important step in natural language processing (NLP), text classification system has been widely used in many fields, like spam filtering, news classification, and web page detection. Vector space model (VSM) is generally used to extract feature vectors for representing texts which is very important for text classification. In this paper, a feature selection algorithm based on synonymmergi...

Publication date: 2012